Training topic classifiers for conversational speech with limited data

Authors

Rukmini Iyer

Jeff Z. Ma

Herbert Gish

Owen Kimball

Abstract

In this paper we demonstrate how automatically generated transcriptions can be used to develop an effective topic classification application. Two key contributions of our work are (a) investigating the impact of unsupervised transcriptions on topic classification where the transcription system has been trained with very limited amounts of data, and (b) demonstrating the use of mixture language models that significantly improve topic classification performance.

Similar resources

Getting More Mileage from Web Text Sources for Conversational Speech Language Modeling using Class-Dependent Mixtures

Sources of training data suitable for language modeling of conversational speech are limited. In this paper, we show how training data can be supplemented with text from the web filtered to match the style and/or topic of the target recognition task, but also that it is possible to get bigger performance gains from the data by using class-dependent interpolation of N-grams.

Confidence-Based Techniques for Rapid and Robust Topic Identification of Conversational Telephone Speech

We investigate the impact of automatic speech recognition errors on the accuracy of topic identification in conversational telephone speech. We present a modified TF-IDF feature-weighting calculation that provides significant robustness under various recognition error conditions. For our experiments we take conversations from the Fisher corpus to produce 1-best and lattice outputs using one reco...

Using conversational word bursts in spoken term detection

We describe a language independent word burst feature based on the structure of conversational speech that can be used to improve spoken term detection (STD) performance. Word burst refers to a phenomenon in conversational speech in which particular content words tend to occur in close proximity of each other as a byproduct of the topic under discussion. To take advantage of bursts, we describe...

A Boosting Approach to Topic Spotting on Subdialogues

We report the results of a study on topic spotting in conversational speech. Using a machine learning approach, we build classifiers that accept an audio file of conversational human speech as input, and output an estimate of the topic being discussed. Our methodology makes use of a well-known corpus of transcribed and topic-labeled speech (the Switchboard corpus), and involves an interesting do...

Class-dependent Interpolation for Estimating Language Models from Multiple Text Sources

Publication date: 2002
